Let's Stop Pushing the Envelope and Start Addressing It: A Reference Task Agenda for HCI
Abstract
We identify a problem with the process of research in the HCI community – an overemphasis on "radical invention" at the price of achieving a common research focus. Without such a focus, it is difficult to build on previous work, to compare different interaction techniques objectively, and to make progress in developing theory. These problems at the research level have implications for practice, too; as researchers we often are unable to give principled design advice to builders of new systems. We propose that the HCI community try to achieve a common focus around the notion of reference tasks. We offer arguments for the advantages of this approach, as well as considering potential difficulties. We explain how reference tasks have been highly effective in focussing research into information retrieval and speech recognition. We discuss the factors that should be considered in selecting HCI reference tasks and present an example reference task. We conclude with recommendations for steps needed to execute the reference task research agenda, including both required technical research and changes in HCI research community practice. The technical research involves: identification of important user tasks by systematic requirements gathering; definition and operationalisation of reference tasks and evaluation metrics; and execution of task-based evaluation along with judicious use of field trials. Perhaps more important, we also suggest changes in HCI community practice. We must create forums for discussion of common tasks and methods by which people can compare systems and techniques. Only through this can the notion of reference tasks be integrated into the process of research and development, enabling the field to achieve the focus it desperately needs.

1. THE PROBLEMS WITH HCI AS RADICAL INVENTION

Research in HCI, particularly as embodied in the CHI conference, focusses largely on novel problems and solutions that push the technology envelope. Most publications describe novel techniques or novel applications of existing techniques. Newman (1994) provides quantitative evidence for this. He compared CHI with five other engineering research fields, such as thermodynamics and aerodynamics, using content analysis to classify abstracts of published papers to identify their research contribution. In other engineering disciplines, over 90% of published research built on prior work, contributing: (1) better modelling techniques (allowing predictions about designs); (2) better solutions (addressing previously insoluble problems); and (3) better tools and methods (to apply models or build prototypes). However, only about 30% of CHI papers fitted these cumulative categories. The majority either reported "radical" solutions (new paradigms, techniques, or applications) or described experience and heuristics relating to radical solutions.

1.1 Radical Invention is not always Effective

This analysis strongly suggests that CHI differs from other engineering research disciplines. We offer arguments that the current state of affairs is problematic with respect to two different success criteria.

One criterion consistent with radical invention is technology transfer. One motivation for constant innovation is the example of whole new industries being created by novel user interfaces. Applications like Visicalc and Lotus 1-2-3 drove the early PC market, and Mosaic/Netscape led to the Web explosion.
In this view, HCI research is an engine room from which novel interaction techniques are snatched by waiting technology companies. There undoubtedly are some success stories according to this criterion, including collaborative filtering (Communications of the ACM 1997, Goldberg et al., 1992, Hill et al., 1995, Resnick et al., 1994, Shardanand & Maes 1995), UI toolkits and general programming techniques (Rudishill et al., 1996). The early graphical user interfaces developed at Xerox PARC (Smith et al., 1982) successfully combined ideas such as overlapping windows and the mouse that predated the coalescence of the HCI community. These ideas then made their way into the Macintosh and Microsoft Windows systems. Nevertheless, user interfaces with widespread impact generally originated outside the HCI community (Isaacs & Tang, 1996). Visicalc was invented by a business student and a programmer. CAD systems developed from Sketchpad (Sutherland, 1963), and were also independently invented by engineers at Boeing and General Motors (Foundyller, 1984). AOL and Instant Messenger were invented by business people following UNIX and MULTICS precursors. Tim Berners-Lee, the inventor of HTML and the Web, while a computer scientist, was not a member of the CHI community.

The second success criterion is scientific. The radical invention model has not aided the development of a science of HCI. This is a controversial area with acrimonious past debate concerning the scientific basis of HCI (Newell & Card, 1985, Carroll & Campbell, 1986), and extended arguments about the relationship of HCI to psychology and cognitive science. There are isolated pockets of HCI research deriving basic precepts from psychological theories (Card et al., 1983, Gray et al., 1993, Olson & Olson, 1990). However, these papers are in the minority (as is evident from Newman's analysis), and they may not have major effects on mainstream HCI practice (Landauer, 1995, Newman, 1994). The analysis so far should make it clear why this is so. Consolidation is impossible if everyone constantly is striking off in novel directions. While radical invention is vital to making progress, so too is cumulative research. Concepts must be clarified, tradeoffs determined, key user tasks and requirements described, metrics or critical parameters (Newman, 1997) identified, and modelling techniques constructed. We are simply not doing enough of this type of work.

1.2 What we don't Know: Requirements, Metrics and Uses of Everyday Technologies

One significant problem originating from the absence of cumulative research is the lack of clear understanding of core user tasks, interactive technologies and techniques. We lack systematic information about tasks that are essential to people's everyday computing activities: browsing, retrieval and management of Web information; use of email and voicemail; personal information management; and task management. While there are many radical solution attempts in these areas, we lack accepted bodies of knowledge about these everyday computer activities. In many of these areas, while a few initial studies have been conducted, there is no consensus about user tasks, no common view of outstanding issues and problems, and no accepted success metrics. Thus, when addressing these problems, researchers must begin by carrying out their own research to identify requirements and evaluation metrics.
This difficulty is manifest for information retrieval (Amento et al., 1999, Whittaker et al., 1998a), asynchronous communication (Whittaker & Sidner, 1996, Whittaker, Davis, Hirschberg & Muller, 2000), and desktop UIs (Barreau & Nardi, 1995). The absence of shared task information makes it difficult to focus research problems, to compare research results, and to determine when a new solution is better, rather than simply different (Newman, 1997).

[Footnote 1: By systematic bodies of knowledge, we employ the very weak criterion that at least two studies have been conducted in a given area. Note that we are not even insisting that the studies agree on their core findings. There are often one or two pioneering studies in a given domain, after which no further research is done.]

A well-known problem with radical invention is that it often is not based on an understanding of user tasks and requirements. Researchers thus find themselves proposing radical solutions to problems that are of little interest to users, while neglecting genuine problems. Barreau and Nardi (1995) studied how users organised desktop information. Most people felt that their computer files were adequately organised, and that archiving tasks did not require major support. Nevertheless, much recent technological work has addressed archival support (Fertig et al., 1996, Gifford et al., 1991, Rao et al., 1994). On the other hand, many people experienced problems in transferring information between applications. Here basic empirical investigation uncovered an important task that was not being addressed by the research community. This insight led to work on Apple Data Detectors (Nardi, Miller & Wright, 1998), now a part of the Macintosh operating system. The research also identified a second requirement that desktop organisers should support, namely reminding. Users remembered outstanding tasks by simply inspecting folders and files. This research thus discovered two novel user problems (and hence criteria for evaluating new versions of desktop organisers), as well as finding that a commonly addressed problem, archiving, actually didn't deserve as much attention.

In addition to a lack of shared task and requirements descriptions, we also have little systematic data about how people use popular technologies. We lack information about how people actually use email, voicemail, cellular phones, the Windows interface, digital personal organisers, and instant messaging. The popularity of these technologies and their widespread usage make it imperative to know how people use them, what they use them for, how successful they are, and where problems lie. Furthermore, we don't have a good understanding of why certain core user interface techniques are successful. GUIs are central to the enterprise of HCI, and although we have successful guidelines for building them (Shneiderman, 1982), we lack theoretical understanding of why they are successful (Baecker, 1987, Brennan, 1990). And of course, new radical innovations such as immersive virtual realities, augmented realities, affective computing, and tangible computing make the problem worse. Not only do we not understand these new technologies and their basic operation, we don't have a clear sense of how much innovation is tolerable or desirable.
In sum, although we lack basic understandings of current users, tasks and technologies, the field is encouraged to try out even more radical solutions, without pausing to do the analysis and investigation required to gain systematic understanding.

[Footnote 2: One complicating factor is that some proprietary research has been conducted into these technologies in industrial contexts. Nevertheless we still need publicly available data about technologies used by millions of people multiple times a day.]

1.3 How we don't Know it: the Dissemination Problem

Even when a useful body of knowledge exists for a core task, the HCI community does not have institutions and procedures for exploiting this knowledge. We advocate workshops for articulating and disseminating knowledge of core tasks and practices. Changes in community standards, e.g., reviewing guidelines for the CHI conference, and in HCI instruction, are also needed for new practices to take hold. These will allow our suggestions to be institutionalised.

2. THE REFERENCE TASK SOLUTION

To address the overemphasis on radical invention and lack of knowledge about important tasks, we propose a modified methodology for HCI research and practice centred on reference tasks. Our proposal has both technical and social practice aspects. We discuss: (1) how reference tasks may be represented and used, and (2) new practices that the HCI community must adopt in order to develop and utilise reference tasks. The goal of reference tasks is to capture and share knowledge and focus attention on common problems. By working on common tasks central to HCI, the community will enjoy these benefits:

• Shared problem definitions, datasets, experimental tasks, user requirements and contextual information about usage situations will allow greater research focus;

• Agreement about metrics (e.g., Newman's (1997) critical parameters) for measuring how well an artifact serves its purpose enables researchers and designers to compare different user interface techniques objectively, and to determine when progress is being made;

• Advice to designers will be based on a stronger foundation, namely knowledge about core tasks within a domain and the best techniques for supporting the tasks;

• Theory development also will be strengthened; the relationship between core tasks, interface techniques and critical parameters provides the basis for a predictive model.

Our proposal partly overlaps Roberts and Moran (1983) and Newman (1997). Roberts and Moran (1983) argue for the use of standard tasks in evaluating word processing applications. Our proposal differs in being independent of a specific application. Newman suggested using critical parameters to focus design on factors that made critical differences to user interface performance. We are motivated by Newman's original findings (1994) and wish to underscore the importance of critical parameters. However, we offer a broader approach that emphasises the relationship between requirements, reference tasks and metrics. Newman's account is unclear about the methods by which critical parameters are chosen. Another concern is that metrics may be task-specific rather than general as his approach would seem to imply. Finally, we address the institutional processes required for the approach to work, in particular, how researchers can jointly identify reference tasks, collect data, analyse tasks, disseminate and make use of shared results.
2.1 Reference Tasks in other Disciplines

To motivate our approach, we trace the role of related concepts in speech recognition and information retrieval.

Speech Recognition (The DARPA Workshops)

Until the late 1980s, speech recognition research suffered from the same problems as HCI research. Researchers focussed on different tasks and datasets, making it difficult to compare techniques and measure progress. Then, DARPA organised an annual workshop where researchers meet for a "bakeoff" to compare system performance on a shared dataset (Marcus, 1992, Price, 1991, Stern, 1990, Wayne, 1989). A dataset consists of a publicly available corpus of spoken sentences, divided into training and test sentences. The initial task was to recognise the individual sentences in the corpus. There was no dialogue, and there were no real-time constraints. The success metric was the number of correctly recognised words in the corpus. At each workshop, participating groups present and analyse their system performance. The utility of different techniques can thus be quantified – identifying which techniques succeed with certain types of data, utterances or recognition tasks. All interested researchers get an annual snapshot of what is working, what isn't, and the overall amount of progress the field is making.

And progress has indeed been made. Initial systems recognised small vocabularies (1000 words), had response times of minutes to hours, and high error rates (10%). Current systems recognise much larger vocabularies (100,000 words), operate in real-time, and maintain the same error rate while recognising increasingly complex spoken sentences. Furthermore, as system performance improves, more difficult tasks have been added to the bakeoff set. Early corpora consisted of high quality audio monologues, whereas more recent tasks include telephone quality dialogues. More recent developments include attempts to extend these methods to interactive tasks (Walker et al., 1998).

Shared datasets have other benefits independent of the annual bakeoffs. There are now standard ways to report results of research taking place outside bakeoffs. Independent studies now report word error rates and performance in terms of shared datasets, allowing direct comparison with known systems and techniques.

Information Retrieval (The TREC Conferences)

A core set of tasks and shared data have also successfully driven research in Information Retrieval. The Text REtrieval Conference (TREC) (Voorhees & Harman, 1997, 1998), sponsored by the United States National Institute of Standards and Technology (NIST), is analogous to the DARPA speech recognition workshops. A major goal of TREC is to facilitate cross-system comparisons. The conference began in 1991, again organised as a bakeoff, with about 40 systems tackling two common tasks. These were routing (standing queries put to a changing database, similar to news clipping services), and ad hoc queries (similar to search engine queries). Metrics for evaluation included precision – the proportion of documents a system retrieves that are relevant – and recall – the proportion of all relevant documents that are retrieved. More refined metrics, such as average precision (for multiple queries at a standard level of recall), also are used. The field has made major progress over 7 years: average precision has doubled from 20% to 40%. Furthermore, the set of TREC tasks is being refined and expanded beyond routing and ad hoc queries.
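To make the metrics just discussed concrete, the following is a minimal illustrative sketch, in our own hypothetical code rather than anything specified by DARPA or TREC, of how word error rate, precision and recall are typically computed. The function and variable names are our assumptions.

```python
from typing import List, Set, Tuple

def word_error_rate(reference: List[str], hypothesis: List[str]) -> float:
    """Word error rate: (substitutions + deletions + insertions) / number of reference words,
    computed as the edit distance between reference and recognised word sequences."""
    d = [[0] * (len(hypothesis) + 1) for _ in range(len(reference) + 1)]
    for i in range(len(reference) + 1):
        d[i][0] = i
    for j in range(len(hypothesis) + 1):
        d[0][j] = j
    for i in range(1, len(reference) + 1):
        for j in range(1, len(hypothesis) + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(reference)][len(hypothesis)] / len(reference)

def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision: fraction of retrieved documents that are relevant.
    Recall: fraction of relevant documents that are retrieved."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical usage:
# word_error_rate("call me at noon".split(), "call we at new".split())   -> 0.5
# precision_recall({"d1", "d2", "d3"}, {"d2", "d3", "d7", "d9"})         -> (0.667, 0.5)
```

Average precision, as used at TREC, aggregates precision in the same spirit over multiple queries or standard recall levels.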
Over the years new tasks have been added, such as interactive retrieval, filtering, Chinese, Spanish, cross-lingual, high precision, very large collections, speech, and database merging. In each case, participants address a common task with a shared dataset. Common tasks and metrics have made it possible not only to compare the techniques used by different systems, but also to compare the evolution of the same system over time (Sparck-Jones, 1998). Similar approaches have been applied successfully in other disciplines such as digital libraries and machine learning.

3. REFERENCE TASKS IN HCI

3.1 Lessons from DARPA and TREC

The case studies using shared tasks, metrics, and datasets reveal a number of relevant lessons. First, there are a number of positive outcomes:

• They show the essential role of the research community. Researchers defined tasks, produced and shared datasets, and agreed on suitable evaluation metrics. Furthermore, community practices were changed. Groups applied their systems to common tasks and data, then met to present and analyse their results. The bakeoff became a key community event.

• The basic task set is continuously refined. Both sets of workshops have added more tasks, increasing task difficulty and realism. This suggests that discovering "ideal" reference tasks will be an iterative collective process.

• One unexpected outcome is that system architectures and algorithms have become more similar. In consequence, it has become possible to carry out independent "black-box" evaluations of different modules. In the case of IR, this common architecture has also become a de facto decomposition of the overall retrieval task.

• A common architecture and shared datasets allow wider participation. Small research groups can evaluate their techniques on a sub-part of the overall task, without needing to construct complete experimental systems.

Several more problematic issues also arise:

• The workshops rely on a bakeoff model, assuming that research results are embodied in working systems that can be evaluated according to objective metrics. But how well will the system-bakeoff model work for HCI?

• Are there key HCI results that cannot be implemented, and thus cannot be evaluated as part of a system? Are there alternatives to the bakeoff model? Might we extend the bakeoff model to areas of HCI that are not focussed on systems, e.g., design, methods or requirements analysis? For methods, does ethnomethodological analysis yield better design data than an experiment? When are different methods useful (Gray & Saltzman, 1998)? Furthermore, the bakeoff itself is not strictly necessary although it serves an important social function. We can distinguish different elements of the DARPA/NIST process; shared datasets could be provided without bakeoffs to compare performance. Obviously this would decrease social interaction surrounding the meetings, but it would still allow for direct system comparison.

• There are also complex issues concerning interactivity. TREC and DARPA have focussed mainly on non-interactive tasks. Going from simple tasks (with definable objective metrics) to more difficult and realistic tasks is not straightforward. Doing it may require fundamentally different algorithms and techniques. Both workshops have found difficulty in moving towards interactive tasks with subjective evaluation criteria.

• Previous evaluations allowed researchers to test systems on existing datasets, enabling the calculation of objective success measures such as word error rate, precision and recall. Bringing humans into the evaluation (as users, subjects, judges) produces a more complicated, costly, and subjective process. If HCI wants to experiment with the bakeoff model, it must begin precisely where other workshops have experienced problems.

• We previously interpreted system convergence positively, but it also may have a negative side. In both workshops, groups sometimes take the strategy of imitating the best system from the previous bakeoff, with additional engineering to improve performance. If this strategy is generally followed, the overall effect is to reduce research diversity, which may mean that techniques do not generalise well to novel problems. It is therefore critical that reference task sets are continually modified and made more complex to prevent "overlearning" of specific datasets and tasks.

We do not yet have solutions for these issues. Instead, we view them as cautions that must be kept in mind as we experiment with reference tasks.

Criteria for Selecting Reference Tasks

How then do we choose appropriate reference tasks for HCI? Candidate reference tasks need to be important in everyday practice. A task may be "important" for different reasons:

• Real – first, tasks must be "real", that is, not divorced from actual user practice.

• Frequent – a task should be central to multiple user activities, so that addressing it will have general benefits. An example here might be processing asynchronous messages. Given the centrality of communication for many user activities, improved ways to manage messages will have widespread benefits;

• Critical – other tasks may occur less frequently, yet require near-perfect execution. Safety critical applications such as air traffic control are the prime example.

These criteria cannot be determined by researchers' intuitions: significant empirical investigations of user activity are needed. We believe the following areas are worthy of intense study and are likely to yield reference tasks:

• information browsing, retrieval, and management;
• task management;
• information sharing;
• computer mediated communication;
• document processing;
• image processing and management;
• financial computation.

In selecting reference tasks, we also must avoid obsolescence. While radical inventions cannot be anticipated, we should exclude tasks that may become unimportant, or be transformed radically through predictable technological progress. Our goals in defining reference tasks include generating shared requirements, accepted task definitions, descriptive vocabulary, task decomposition, and metrics. Common task definitions are critical for researchers to determine how other research is related to their effort. We will discuss how reference tasks are to be defined, and give an illustrative example. First, however, we think it is worthwhile to discuss potential drawbacks of our approach.

Potential Objections to our Proposal

One potential problem is that HCI research may shift from innovation to become merely a "clean up" operation, directed solely at improving existing tasks, techniques, and applications. However, the areas of information retrieval and speech recognition provide hopeful counter-examples.
Developments in speech recognition have led to successful applications to novel and important problems such as searching speech and video archives – and TREC has added tasks in these areas (Voorhees & Harman, 1997, 1998). Furthermore, a shift away from innovation may be necessary: the history of science and technology indicates that many major inventions required a critical mass of innovators producing multiple versions of a given technology before its successful uptake (Marvin, 1988). Working in a radical invention mode precisely fails to achieve critical mass and thus the repeated solution attempts needed for adoption. Again, we are not declaring a moratorium on radical invention, just arguing for a different emphasis – HCI needs more "normal science" (Kuhn, 1996).

There is also the danger of adopting a faulty paradigm. Progress in a field is severely limited when it rests on commonly accepted assumptions that turn out to be flawed. Cognitive Science and Artificial Intelligence have seen much lively debate over foundational assumptions (Dreyfus 1992, Ford & Pylyshyn 1995, Harnad 1990, Searle 1981). The notion of representation that was taken for granted in symbolic AI has been attacked (Bickhard & Terveen 1995). Similar arguments have been offered in the speech community. When non-interactive tasks and the sole performance metric of word error rate were central, techniques based on Hidden Markov models were popular. However, these techniques do not generalise well to "non-standard" situations such as hyperarticulation (Oviatt, 1996) or speech in noisy environments (Junqua, 1999). We do not believe the reference task approach runs this risk, however. Instead of proposing new assumptions or a new theory, we are suggesting a modified methodology, with more attention being paid to existing tasks. And note that completely radical solutions are consistent with our approach; they just need to be made relevant to a reference task and followed up by systematic analysis. We need a more rigorous understanding of the core conceptual territory of HCI so that we can better understand the role of radical innovations.

A variant of this last argument is that reference tasks induce bias towards the quantifiable, and a concurrent blindness to more subtle considerations. Much recent HCI work has shown how factors that are not easily quantified, such as ethical issues (Nardi et al., 1996, Nardi and O'Day, 1999) and social relationships among various stakeholders (Grudin, 1988, Orlikowski, 1992), affect the success of interactive technologies. From a design perspective, aesthetic issues also have a substantial impact on the success of applications (Laurel, 1990). Nevertheless, the reference task approach is neutral with respect to such factors. Insofar as factors are crucial to user performance and satisfaction in a given task, successful reference task definitions naturally must incorporate them. Many of these issues also may relate to subjective user judgements. Our later discussion on metrics addresses the role of subjective measures such as user satisfaction. Our hope is to discover systematic ways that users make decisions about interfaces. By defining appropriate methods to elicit this information, we can address this problem.

4. HOW TO DEFINE A REFERENCE TASK

We adopt the activity theory view that a task is a conscious action subordinate to an object (Kaptelinin, 1996).
Each action, or task, supports some specific object such as completing a research paper, closing a sale, building an aeroplane or curing a patient. The object in these cases is the paper, the sale, the aeroplane, the patient. The tasks are performed to transform the object to a desired state (complete paper, closed sale, functioning aeroplane, healthy patient). The same tasks can occur across different objects; thus, the task of outlining could be useful in writing a book, preparing legal boilerplate, or specifying a product. In studying reference tasks it is important to determine the object of tasks so that appropriate customisations can be offered. While there might be a generic "outlining engine," outlining a product specification could entail special actions that require customising the basic engine. Keeping the object in mind will bring designs closer to users' requirements.

We also need empirical work to determine good domains for investigating candidate reference tasks. Of the many tasks involving computers, we must identify tasks satisfying our criteria of frequency and criticality. Defining a reference task may begin with an analysis of prior relevant work. All too often, each individual research effort defines its own problem, requirements, and (post-hoc) evaluation metrics. However, by analysing a broad set of related papers, one can abstract common elements:

• What are the user requirements in this area? Are they based on solid empirical investigation? Often the answer is no – which means more empirical studies of user activity are needed.

• Is there a common user task (or set of tasks) that is being addressed?

• What are the components of the task(s)? Is a task decomposition given, or can one be abstracted from various papers?

• What is the range of potential solution techniques? What problems do they address, and what problems are unsolved? Are there problems in applying various techniques (do they require significant user input, scaling, privacy or security concerns)?

• How are solution techniques evaluated? Are metrics proposed that generalise beyond the single originating study?

This last issue is crucial – it captures Newman's (1997) "critical parameters" that define the artifact's purpose and measure how well it serves that purpose. If researchers abstract tasks from related work, they may be personally satisfied with the result. But other researchers may have different perspectives on all task aspects. For this reason, important community practices need to be introduced. Representative researchers and practitioners concerned with a particular area need to meet to discuss, modify, and approve the reference task definition. This would be like a standards committee meeting, although faster and more lightweight. Such groups might meet in the CHI workshops programme or in government sponsored workshops, organised by NIST or DARPA, for example. After a reference task is approved, its definition would be published, e.g. in the SIGCHI Bulletin and Interactions, with the complete definition appearing on the Web. But agreed reference task definitions also need to be modifiable, as researchers and practitioners experiment with them. One might use the NIST TREC model in which tasks are discussed annually, with modifications being made in the light of feedback. Finally, the community must reinforce the important role of the shared knowledge embodied in reference tasks.
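To suggest what a published reference task definition might contain, here is a minimal hypothetical sketch of such a record as a structured data type. The field names and the example values are our own assumptions, not an agreed standard.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ReferenceTaskDefinition:
    """Hypothetical structure for a published, community-approved reference task definition."""
    name: str                          # e.g. "Browsing and retrieval in speech archives"
    user_requirements: List[str]       # empirically grounded requirements
    task_decomposition: List[str]      # component sub-tasks
    metrics: List[str]                 # agreed evaluation metrics / critical parameters
    datasets: List[str] = field(default_factory=list)    # shared corpora, if any
    open_issues: List[str] = field(default_factory=list) # known gaps, to be revisited annually

# Hypothetical example, loosely based on the voicemail study discussed in Section 5:
voicemail_task = ReferenceTaskDefinition(
    name="Voicemail browsing and retrieval",
    user_requirements=["prioritise incoming messages", "locate valuable saved messages",
                       "extract facts such as caller name and phone number"],
    task_decomposition=["search", "information extraction", "summarisation"],
    metrics=["time to locate a target message", "extraction accuracy", "user satisfaction"],
)
```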
Educational courses must show how tasks are defined, and the benefits from using this knowledge, as well as emphasising the problems that the reference task approach addresses. And the CHI review process could be modified so that reviewers explicitly rate papers with reference to the reference task model.

5. AN EXAMPLE REFERENCE TASK: BROWSING AND RETRIEVAL IN SPEECH ARCHIVES

We now discuss an example reference task: browsing and retrieval in speech archives. It is intended to illustrate (a) identifying reference tasks; (b) using them to evaluate and improve user interfaces; and (c) issues arising in this endeavour. We summarise work reported in recent research papers (Choi et al., 1998, Nakatani et al., 1998, Whittaker et al., 1998a, Whittaker et al., 1998b, Whittaker et al., 1998c, Whittaker et al., 1999, Whittaker et al., 2000). Other areas would have served equally well in illustrating reference tasks; we selected this area simply because of our personal expertise in this domain.

5.1 Selecting and Specifying Reference Tasks in the Domain of Speech Archives

Two criteria we proposed earlier for selecting a reference task were that the task is either frequent or critical. So what is the evidence that accessing speech data is an important user task? Conversational speech is both frequent and central to many everyday workplace tasks (Chapanis, 1975, Kraut et al., 1993, Whittaker et al., 1994). Voice messaging is a pervasive technology at work and at home, with both voicemail and answering machines requiring access to stored speech data. In the US alone, there are over 63 million voicemail users. New areas of speech archiving are also emerging, with television and radio programs becoming available on-line. These observations indicate that searching and browsing speech data meet the criteria of being frequent, general and real. Furthermore, we will show that the tasks we identify in speech retrieval generalise to retrieval of textual data, making it possible to use them more widely.

But identifying the area of speech retrieval does not identify specific user tasks when accessing speech archives. We therefore collected several different types of data concerning people's processing of voicemail. We chose to examine voicemail access rather than other audio data such as news, because voicemail is currently the most pervasive speech access application. We collected qualitative and quantitative data for a typical voicemail system, Audix: (a) server logs; (b) surveys from high volume users; (c) interviews with high volume users. We also carried out laboratory tests to confirm our findings on further users.

We found three core speech access tasks: (a) search, (b) information extraction, and (c) message summarisation. Search is involved in prioritising incoming new messages, and in locating valuable saved messages. Our working definition of search is: given a set of messages, identify a (small) subset of messages having relevant attributes with certain values (for example, being from a particular person or being about a particular topic). Information extraction involves accessing information from within messages. This is often a laborious process involving repeatedly listening to a message for verbatim facts such as the caller's name and phone number. Our definition of information extraction is: given a (set of) message(s) and a set of relevant attributes, identify the values associated with those attributes.
A final task at the message level is summarisation: to avoid repeatedly replaying messages, users attempt to summarise their contents, usually by taking handwritten notes consisting of a sentence or two describing the main point of the message. We define summarisation as involving selection of a subset of information from within the document that best captures the meaning of the entire document. For more formal definitions of summarisation we refer the reader to Sparck-Jones (1998).

These three tasks were generated by analysis of voicemail user data. Nevertheless, although they originated from speech data, we found analogues in the independently generated TREC textual retrieval tasks. The fact that these three tasks are common to searching both speech and text is encouraging for the reference task approach. It argues that there may be general search tasks that are independent of media type.
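To make these working definitions concrete, the sketch below expresses the three voicemail access tasks as operations over messages. It is our own illustrative code, not part of the studies summarised above; the message attributes and function names are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Message:
    """A voicemail message reduced to attribute/value pairs plus a transcript."""
    attributes: Dict[str, str]   # e.g. {"caller": "J. Smith", "topic": "budget", "phone": "555-0199"}
    transcript: str

def search(messages: List[Message], wanted: Dict[str, str]) -> List[Message]:
    """Search: given a set of messages, return the (small) subset whose attributes match the wanted values."""
    return [m for m in messages
            if all(m.attributes.get(k) == v for k, v in wanted.items())]

def extract(message: Message, attribute_names: List[str]) -> Dict[str, str]:
    """Information extraction: given a message and a set of relevant attributes, return their values."""
    return {name: message.attributes.get(name, "") for name in attribute_names}

def summarise(message: Message, select: Callable[[str], str]) -> str:
    """Summarisation: select the subset of message content that best captures its overall meaning.
    The selection strategy itself is the hard research problem; here it is simply a parameter."""
    return select(message.transcript)
```

In an evaluation built around this reference task, the message corpus and target attributes would be fixed in advance, and the agreed metrics would score how well, and how quickly, users or systems perform these operations.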
Journal: Human-Computer Interaction, Volume 15, 2000.